-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-4163][Core][WebUI] Send the fetch failure message back to Web UI #3032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #22594 has started for PR 3032 at commit
|
Test build #22600 has started for PR 3032 at commit
|
Test build #22594 has finished for PR 3032 at commit
|
Test PASSed. |
Test build #22600 has finished for PR 3032 at commit
|
Test PASSed. |
Test build #22681 has started for PR 3032 at commit
|
Test build #22681 has finished for PR 3032 at commit
|
Test FAILed. |
extends Exception { | ||
|
||
override def getMessage: String = | ||
"Fetch failed: %s %d %d %d".format(bmAddress, shuffleId, mapId, reduceId) | ||
|
||
def toTaskEndReason: TaskEndReason = FetchFailed(bmAddress, shuffleId, mapId, reduceId) | ||
def toTaskEndReason: TaskEndReason = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may fit within 100ch on one line
LGTM, only had superficial comments. |
@aarondav thanks for reviewing it. Updated as per your comments. |
Test build #22696 has started for PR 3032 at commit
|
Test build #22696 has finished for PR 3032 at commit
|
Test PASSed. |
case class FetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer) { | ||
def failed: Boolean = size == -1 | ||
if (failed) assert(buf == null) else assert(buf != null) | ||
sealed case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, "sealed" is usually only applied to traits, as case classes themselves are not supposed to be extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already replaced sealed
with private[storage]
. Thank you.
How about now? |
Test build #22753 has started for PR 3032 at commit
|
Test FAILed. |
Test build #22757 has started for PR 3032 at commit
|
Test build #22753 has finished for PR 3032 at commit
|
Test PASSed. |
Test build #22757 timed out for PR 3032 at commit |
Test FAILed. |
Jenkins, retest this please. |
Test build #22764 has started for PR 3032 at commit
|
Test build #22764 timed out for PR 3032 at commit |
Test FAILed. |
blockId match { | ||
case ShuffleBlockId(shufId, mapId, _) => | ||
val address = statuses(mapId.toInt)._1 | ||
throw new FetchFailedException(address, shufId.toInt, mapId.toInt, reduceId) | ||
throw new FetchFailedException(address, shufId.toInt, mapId.toInt, reduceId, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this patch out locally, and it appears that, sadly, the cause is actually not propagated correctly back to the driver. I think we can revert this to the prior Utils.exceptionString() way, that looks good in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert to the Utils.exceptionString() way.
Looks Utils.exceptionString()
does not handle the nested exceptions like this. It only outputs the outermost exception info. Is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is almost certainly not intentional. I think the method was introduced rather recently, and does not have many users right now. It should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to fix it in another PR.
Test build #22800 has started for PR 3032 at commit
|
LGTM, will merge as soon as Jenkins passes. |
Test build #22800 has finished for PR 3032 at commit
|
Test PASSed. |
Merging into master. Thanks for this! |
This is a PR to send the fetch failure message back to Web UI.


Before:
After (Please ignore the meaning of exception, I threw it in the code directly because it's hard to simulate a fetch failure):

